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Abstract. We consider optimal experimental design (OED) for Bayesian nonlinear inverse problems governed 
by partial differential equations (PDEs) under model uncertainty. Specifically, we consider inverse 
problems in which, in addition to the inversion parameters, the governing PDEs include secondary 
uncertain parameters. We focus on problems with infinite-dimensional inversion and secondary pa- 
rameters and present a scalable computational framework for optimal design of such problems. The 
proposed approach enables Bayesian inversion and OED under uncertainty within a unfied framework. 
We build on the Bayesian approximation error (BAE) framework, to incorporate modeling uncer- 
tainties in the Bayesian inverse problem, and methods for A-optimal design of infinite-dimensional 
Bayesian nonlinear inverse problems. Specifically, a Gaussian approximation to the posterior at the 
maximum a posteriori probability point is used to define an uncertainty aware OED objective that 
is tractable to evaluate and optimize. In particular, the OED objective can be computed at a cost, 
in the number of PDE solves, that does not grow with the dimension of the discretized inversion 
and secondary parameters. The OED problem is formulated as a binary bilevel PDE constrained 
optimization problem and a greedy algorithm, which provides a pragmatic approach, is used to find 
optimal designs. We demonstrate the effectiveness of the proposed approach for a model inverse 
problem governed by an elliptic PDE on a three-dimensional domain. Our computational results also 
highlight the pitfalls of ignoring modeling uncertainties in the OED and/or inference stages. 


Key words. Optimal experimental design, sensor placement, Bayesian inverse problems, model uncertainty, 
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1. Introduction. Models governed by partial differential equations (PDEs) are common 
in science and engineering applications. Such PDE models often contain parameters that need 
to be estimated using observed data and the model. This requires solving an inverse problem. 
The quality of the estimated parameters is influenced significantly by the quantity and quality 
of the measurement data. Therefore, optimizing the data acquisition process is crucial. This 
requires solving an optimal experimental design (OED) problem [7, 13,37]. In the present work, 
we focus on inverse problems in which measurement data are collected at a set of sensors. In 
this case, OED amounts to finding an optimal sensor placement. In this context, OED is 
especially important when only a few sensors can be deployed. 

While some parameters in the governing PDEs can be estimated by solving an inverse 
problem, often there are additional uncertain model parameters that are not being estimated. 
These parameters might be too costly or impossible to estimate. We call the uncertain param- 
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eters that are being estimated in an inverse problem the inversion parameters and refer to 
the additional uncertain parameters as the secondary parameters. Such secondary parameters 
have also been referred to as auxiliary parameters, latent parameters, or nuisance parameters 
in the literature. When solving inverse problems with secondary parameters with significant 
uncertainty levels, both the parameter estimation and data acquisition processes need to be 
aware of such uncertainties. In this article, we present a computational framework for optimal 
design of nonlinear Bayesian inverse problems governed by PDEs under model uncertainty. 

Uncertainties in mathematical models can be divided into two classes: reducible and ir- 
reducible [35]. The reducible uncertainties are epistemic uncertainties that can be reduced 
via statistical parameter estimation. On the other hand, irreducible uncertainties are either 
aleatoric uncertainties inherent to the model that are impossible to reduce or are epistemic 
uncertainties that are impractical or impossible to reduce. In the present work, we focus on 
the case of irreducible uncertainties. 

We consider models of the form 


(1.1) y =G(m,€) +n, 


where y is a vector of measured data, G is a PDE-based model, m and € are uncertain 
parameters, and ņ is a random vector that models measurement noise. Herein, m is the 
inversion parameter which we seek to infer, and € is a secondary uncertain model parameter. 
We assume the uncertainty in € to be irreducible. The parameters m and € are assumed to be 
independent random variables that take values in infinite-dimensional real separable Hilbert 
spaces . and %, respectively. Moreover, G is assumed to be nonlinear in both m and é. The 
methods presented in this article enable computing optimal experimental designs in such a 
way that the uncertainty in the secondary parameters is accounted for. 

Related work. In the recent years, there have been numerous research efforts directed at 
OED in inverse problems governed by PDEs. See the review [1], for a survey of the recent 
literature in this area. There has also been an increased interest in parameter inversion and 
design of experiments in systems governed by uncertain forward models; see e.g., [6, 14, 24, 
26, 32,34] for a small sample of the literature addressing inverse problems under uncertainty. 
Methods for OED in such inverse problems have been studied in [5,11,18,27]. The works [5,27] 
concern optimal design of infinite-dimensional Bayesian linear inverse problems governed by 
PDEs. Specifically, [27] considers design of linear inverse problems governed by PDEs with 
irreducible sources of model uncertainty. On the other hand, [5] targets OED for linear inverse 
problems with reducible sources of uncertainty. The efforts [11,18] focus on inverse problems 
with finite- and low-dimensional inversion and secondary parameters. These articles devise 
sampling approaches for estimating the expected information gain in such problems. The 
setting considered in [18] is that of inverse problems with reducible uncertainties. The approach 
in [11] employs a small noise approximation that applies to problems with nuisance parameters 
that have small uncertainty levels. 

Our approach. We focus on Bayesian nonlinear inverse problems governed by PDEs with 
infinite-dimensional inversion and secondary parameters. Traditionally, when solving such in- 
verse problems all the secondary model parameters are fixed at their nominal values and the 
focus is on the estimation of the inversion parameters. Considering (1.1), this amounts to 
using the approrimate model F(m) := G(m,€), where É is some nominal value. In Section 2, 
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we consider simpler instances of (1.1) and illustrate the role of model uncertainty in Bayesian 
inverse problems and the importance of accounting for such uncertainties in parameter inver- 
sion and OED. In that section, we also make a connection between the approach taken in the 
present work and the approach for design of linear inverse problems under uncertainty in [5]. 
The discussion in Section 2 motivates our approach for incorporating secondary uncertainties 
in nonlinear Bayesian inverse problems and the corresponding OED problems. This is done 
using the Bayesian approximation error (BAE) approach [24,25]; see Section 3. 

Subsequently, we build on methods for optimal design of infinite-dimensional nonlinear 
inverse problems [1,4,38] to derive an uncertainty aware OED objective. Specifically, we fol- 
low an A-optimal design strategy where the goal is to obtain designs that minimize average 
posterior variance. To cope with the non-Gaussianity of the posterior, we rely on a Gaussian 
approximation to the posterior. This enables deriving approximate measures of posterior un- 
certainty that are tractable to optimize for infinite-dimensional inverse problems; see Section 4. 
We then present two approaches for formulating and computing the OED objective; see Sec- 
tion 5. The first approach uses Monte Carlo trace estimators and the second one formulates 
the OED problem as an eigenvalue optimization problem. In each case, the OED problem is 
formulated as a bilevel binary PDE-constrained optimization problem. In both proposed ap- 
proaches, the cost of computing the OED objective, in terms of the number of PDE solves, is 
independent of the dimensions of discretized inversion and secondary parameters. This makes 
these approaches suitable for large-scale applications. In the present work, we rely on a greedy 
approach to solve the resulting optimization problems. As discussed in Section 5, a greedy 
algorithm is especially suited for the formulation of the OED problem in Section 5.2.2, as a 
binary PDE-constrained eigenvalue optimization problem. 

We elaborate the proposed approach in the context of a model inverse problem governed 
by an elliptic PDE in a three-dimensional domain; see Section 6. This inverse problem, which 
is motivated by heat transfer applications, concerns estimation of a coefficient field on the 
bottom boundary of the domain, using sensor measurements of the temperature on the top 
boundary. The secondary uncertain parameter in this inverse problem is the log-conductivity 
field, which is modeled as a random field on the three-dimensional physical domain. Our 
computational results in Section 7 demonstrate the effectiveness of the proposed strategy in 
computing optimal sensor placements under model uncertainty. We also systematically study 
the drawbacks of ignoring the uncertainty in the Bayesian inversion and experimental design 
stages. These studies illustrate the fact that ignoring uncertainty in OED or inference stages 
can lead to inferior designs and highly inaccurate results in parameter estimation. 

Contributions. The contributions of this article are as follows: (1) We present an uncer- 
tainty aware formulation of the OED problem, uncertainty aware OED objectives along with 
scalable methods for computing them, and an extensible optimization framework for com- 
puting optimal designs. These make OED for nonlinear Bayesian inverse problems governed 
by PDEs with infinite-dimensional inversion and secondary parameters feasible. Additionally, 
the proposed approach enables Bayesian inversion and OED under uncertainty within a un- 
fied framework. (2) We elaborate the proposed approach for an inverse problem governed by 
an elliptic PDE, on a three-dimensional domain, with infinite-dimensional inversion and sec- 
ondary parameters. This is used to elucidate the implementation of our proposed approach 
for design of inverse problems governed by PDEs under uncertainty. (3) We present compre- 
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hensive computational studies that illustrate the effectiveness of the proposed approach and 
also the importance of accounting for modeling uncertainties in both parameter inversion and 
experimental design stages. (4) By considering simpler instances of (1.1), in Section 2, we 
present a systematic study of the role of model uncertainty in Bayesian inverse problems and 
the importance of accounting for such uncertainties in parameter inversion and OED. That 
study also reveals a connection between the BAE-based approach taken in the present work 
and the method for OED in linear inverse problems under uncertainty in [5]. 

The developments in this article point naturally to a number of extensions of the presented 
methods. We discuss such issues in Section 8, where we present our concluding remarks, discuss 
potential limitations of the presented approach, as well as opportunities for future extensions. 


2. Motivation and overview. In this section, we motivate our approach for OED under 
uncertainty and set the stage for the developments in the rest of the article. After a brief 
coverage of requisite notation and preliminaries in Section 2.1, we begin our discussion in Sec- 
tion 2.2 by considering a simple form of (1.1) where G is linear in m and €. This facilitates an 
intuitive study of the role of model uncertainty in Bayesian inverse problems and OED. We 
then consider nonlinear models of varying complexity in Section 2.3 to motivate our approach 
for design of inverse problems governed by nonlinear models of the form (1.1). 


2.1. Preliminaries. In this article, we consider inversion and secondary parameters that 
take values in infinite-dimensional Hilbert spaces. For a Hilbert space #, we denote the 
corresponding inner product by (-, -) 7 and the associated induced norm by ||-||ye; i-e., || Ilze := 


a For Hilbert spaces Hı and #2, L (H1, #2) denotes the space of bounded linear 
transformations from 4) to H2. The space of bounded linear operators on a Hilbert space 50 
is denoted by £ (YC), and the subspace of bounded selfadjoint oprators is denoted by LIM (A£). 
We let £°™*(F€) denote the set of bounded positive selfadjoint operators. The subspace of 
SL (H) consisting of trace-class operators is denoted by L1(4€), and the subspace of selfadjoint 
trace-class operators is denoted by L£P™ (Xe). Also, the sets of positive and strictly positive 
selfadjoint trace-class operators are denoted by £59" (Je) and LPP (JE), respectively. 

Throughout the article, N(a,C) denotes a Gaussian measure with mean a and covariance 
operator C. For a Gaussian measure on an infinite-dimensional Hilbert space C, the covari- 
ance operator C is required to be in LP (IE). Herein, we consider non-degenerate Gaussian 
measures; i.e., we assume that C € £P™*™* (Je). For further details on Gaussian measures, we 
refer to [15,36]. Also, when considering measures on Hilbert spaces, we equip these spaces 
with their associated Borel sigma algebra. Throughout the article, for notational convenience, 
we suppress this choice of the sigma-algebra in our notations. 

The adjoint of a linear transformation A E€ L (H1, 42), where Hı and #2 are (real) 


Hilbert spaces, is denoted by A*. Recall that A* € L (H2, H1) and 
(Avi, v2) ye, = (v1, A* v2) ye, 5 for all vy E€ H1, v2 E Hd. 


We also recall the following basic result regarding affine transformations of Gaussian ran- 
dom variables. Let X be an %-valued Gaussian random variable with law N(a,C), A € 
F (H1, H2), and b E€ H2. Then, the random variable AX + b is an /€2-valued Gaussian ran- 
dom variable with law M (Aa + b, ACA*); see [15] for details. 
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2.2. Linear models. Consider the model 
(2.1) y=Sm4+TE+N, 


where y € R? denotes measurement data, m € M is the inversion parameter, € € % is the 
secondary uncertain parameter, and 7 is the measurement noise vector. (The spaces M and 
% are as described in the introduction.) This type of model, which was considered in [5], may 
correspond to inverse problems governed by linear PDEs with uncertainties in source terms 
or boundary conditions. We assume S € L(M,Rº) and T € £(%,R*). Moreover, we assume 
that n ~ N (0, Thoise) and that 7 is independent of m and £. We consider a Gaussian prior law 
lpr = N(mpr, Cpr) for m, and for the purpose of this illustrative example, let the secondary 
model uncertainty € be distributed according to a Gaussian pe = M (é : Ce). Also, we assume 
Thoise = 071, with o? denoting the noise level. 

Incorporating model uncertainty in the inverse problem. For each fixed realization of €, the 
estimation of m from (2.1) is a standard problem. This also follows the common practice of 
fixing additional model parameters at some nominal values before solving the inverse problem. 
However, it is possible to account for the model uncertainty in this process. To see this, suppose 
we fix € at € and consider the approximate model F(m) = Sm + TE. Note that 


Sm+TE =Sm+TE+T(E-— É). 
—_—— —— 


accurate model F(m) error € 


In this case, the approximation error € = T(£ — €) has a Gaussian law, € ~ N (0, TCeT*). 
Hence, we may rewrite (2.1) in terms of the approximate model F as follows: 


(2.2) y=F(m)+y, 


where v = e + n denotes the total error. In the present setting, v ~ N(0,TCeT* + Poise). 
Note that we have incorporated the uncertainty due to € in the error covariance matrix.! 
Since we have Gaussian prior and noise models and F is affine, the posterior is also Gaussian 
with analytic formulas for its mean and covariance operator; see e.g., [36]. In particular, the 
posterior covariance operator is given by 


(2.3) Cpost = (S*T,'S 405) = (S*(TCeT* aft Tnoise) So ce ey mee 


Considering, for example, the A-optimality criterion tr(Cpost) as a measure of posterior uncer- 
tainty, (2.3) illustrates how the uncertainty due to use of an approximate model impacts the 
posterior uncertainty. 

Interplay between measurement error and approximation error. Note that TCgT* € £°9™* (R2). 
This operator admits a spectral decomposition VAV", where V is an orthogonal matrix of 
eigenvectors and A is a diagonal matrix with the eigenvalues on it diagonal. Thus, the total 
error covariance matrix can be written as T, = VAV! 4+ 07I => ACY + o°)v;v}. Therefore, 


‘If the approximate model F was defined by fixing € at a point different from €, then the error term v 
would have nonzero mean. 
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the modes for which A; < o? may be ignored. To observe the impact of model uncertainty on 
individual observations, we note that 


Var{vi} = ei Te; =X (+06) elv vj)? ; HE Terço 
J 
where e;'s are the standard basis vectors in Rº. Thus, we see that only large eigenvalues 
contribute significantly to the total error in the ith measurement. 


We can also consider the interplay between the spectral representation of the error co- 
variance and the posterior covariance operator. Specifically, Cpost = (S*T7'S + C) = 


COTS +1) ig? with S = sey. Thus, letting E; = vjvo}, j=1,...,d, we can write 
q Er äi 
Cpost = = eS (A; + o? DIS E;S + 1] Gr 
j=l 


Note that in the directions corresponding to very large eigenvalues the measurements will 
have a negligible impact on posterior uncertainty. 

The above discussion indicates that model uncertainty cannot be ignored in the inverse 
problem, especially when the model uncertainty overwhelms measurement noise. The latter 
is also important in design of experiments. Specifically, some of the measurements might be 
completely useless due to large amount of model uncertainty associated to the corresponding 
measurements. This information needs to be accounted for in the OED problem to ensure 
only measurements that are helpful in reducing posterior uncertainty are selected. 

Connection to post-marginalization. If we consider € as a reducible uncertainty that is being 
estimated along with m and pe as the corresponding prior law, then (2.3) can be shown to be 
the covariance operator of the marginal posterior law of m. To see this we first note that 


TY i T* Ee 


(2.4) (TCeT* + Tis” Sto EC Ea TTo 
This relation can be derived by following a similar calculation as the one in [36, p. 536].? 
Then, we substitute (2.4) in (2.3) to obtain 

Drs. 


noise 


Cpost = [Coy + ST iiss S Ta 


noise noise 


TE; TT- 


noise 


This form of Cpost is the same as the marginal posterior covariance operator of m, as noted 
in [5, Equation 2.4]. This observation indicates the equivalence of the BAE-based approach and 
the approach of [5] in the case of linear Gaussian inverse problems with reducible uncertainty. 
While BAE involves pre-marginalization, the approach of [5] considers post-marginalization. 


2.3. Nonlinear models. The linear model (2.1) can be generalized in the following ways: 


(2.5) additive model, linear in m, nonlinear in € : G(m,€) = Sm +T (é); 
(2.6) additive model, nonlinear in m, linear in € : G(m,€) = S(m) + TE; 


?Note that (2.4) can be viewed as a special form of the Sherman-Morrison-Woodbury formula involving 
Hilbert space operators. 
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(2:7) additive model, nonlinear in both m and é: G(m,€) = S(m) + T(E); 


2.8) nonadditive model, nonlinear in both m and €. 


While the cases (2.5)—(2.7) might be of independent interest, our focus in this article is on the 
general case (2.8). However, items (2.5)—(2.7) do serve to illustrate some of the key challenges. 

In the case of (2.5), one can repeat the steps leading to (2.2), except € will not be Gaussian 
anymore and therefore the distribution of v will not be known analytically. In that case, one 
may obtain a Gaussian approximation to € either by fitting a Gaussian to € or by using a linear 
approximation of 7 (£). Then, one may obtain a Gaussian posterior, where one also relies on 
the linearity of S. On the other hand, in the case of (2.6), the total error v will be Gaussian as 
before, but due to nonlinearity of S(m) the posterior will not be Gaussian. The latter leads to 
one of the fundamental challenges in OED of nonlinear inverse problem—defining a suitable 
OED objective whose optimization is tractable. The more complicated cases of (2.7)—(2.8) 
inherit the challenges corresponding to the previous cases. 

In the rest of this article, we build on the BAE approach to incorporate the uncertainty 
in € in inverse problems governed by nonlinear models of the type (2.8). The uncertainty in € 
will be assumed irreducible, and in general, € will not be assumed to follow a Gaussian law. 
All that we require is the ability to generate samples of €. Following the BAE approach, we 
approximate the approximation error € with a Gaussian. This enables incorporating the model 
uncertainty in the data likelihood; see Section 3. To cope with non-Gaussianity of the posterior, 
we rely on a Gaussian approximation to the posterior, to derive an uncertainty aware OED 
objective that is tractable to evaluate and optimize for infinite-dimensional inverse problems; 
see Section 4 for the definition of the OED objective and Section 5 for computational methods. 


3. Infinite-dimensional Bayesian inverse problems under uncertainty. We consider the 
inverse problem of inferring a parameter m from a model of the form (1.1), where G : M xX — 
Rº is a parameter-to-observable map that in general is nonlinear in both arguments. We focus 
on problems where G is defined as a composition of an observation operator and a PDE 
solution operator. The model G is assumed to be Fréchet differentiable in m, at £ = É, where 
€ € X is a nominal value. As before, we employ a Gaussian noise model, n ~ N (0, TP noise), à 
Gaussian prior law fpr = N (Mpr, Cpr) for m, and assume 77 is independent of m and é. The 
prior induces the Cameron—Martin space É = range(C} 2, which is endowed with the inner 
product [15,17] 

(a, b)g = (Cad nbES, 


where (-,:)y is the inner product on the parameter space M. 

To account for model uncertainty (due to £) in the inverse problem, we rely on the BAE 
approach [24,25], which we explain next. We fix the secondary parameter to € and consider 
the approximate (also known as inaccurate, reduced order, or surrogate) model 


(3.1) F(m) = G(m, 8). 


As mentioned before, this is typically what is done in practice where the secondary model 
parameters are fixed at some (possibly well-justified) nominal values. In the BAE approach, 
we quantify and incorporate errors due to the use of this approximate model in the Bayesian 
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inverse problem analogously to what was done in Section 2.2. Namely, we consider 


(3.2) y = F(m) + G(m, £) — F(m) +n = F(m) + v(m,6), 
— eee” 


approximation error € 


where v(m, €) = e(m, €) +7 is the total error. In the BAE framework, the approximation error 
€, is approximated as a conditionally Gaussian random variable. That is, the distribution of 
e|m is assumed to be a Gaussian. In the present work, we employ the so called enhanced error 
model [10,24,26], that ignores the correlation between € and m and approximates the law of 
e as a Gaussian € ~ N (eo, I6) with 


Es f f e(m, £) Hpr(dm) ue(dê), 
M JX 


(3.3) 
To= | f (etmm,g) e0) (elm, £) ~ eo)" (dm) melde). 
M SX 

In practice, the mean and covariance operator of € are computed via sampling; see Section 5. 
With these approximations, and by our assumption on the measurement noise (which is in- 
dependent of the parameters), the total error v is modeled by a Gaussian N (eo, Tp), where 
DP, = T; + Thoise. Using this approximate noise model along with the approximate model F 
we arrive at the following data likelihood: 


: (y — F(m) eo) T7 (y — F(m) — eo) +. 


(3.4) TRE (ulm) x exp { = 5 


With the prior measure in place, and using this BAE-based data likelihood, we can state the 
Bayes formula [36], 
dHPost 
Apr 
To make computations involved in design of large-scale inverse problems tractable, we rely 
on a local Gaussian approximation to the posterior. Namely, we use Pe cst =N (mË, aps Cost) 
where mYap is the maximum a posteriori probability (MAP) point and C¥,., is an approximate 


post 
posterior covariance operator, described below. The MAP point is given by 


x Mike (Ym). 


(u — F(m) — eo) "T7 (y — F(m) — eo) + 5 (m = mpr, m — mpe- 


NI = 


Map = arg min J (m) := 
mes 


y 


For the approximate posterior covariance operator Croats 


we use 
dera -~1\-1 
Cat = (Fant) E; rm ie) Et] , 


where Fm(M#iap) is the the Fréchet derivative of F evaluated at map. 


4. A-optimal experimental design under uncertainty. We consider inverse problems in 
which measurement data are collected at a set of sensors. In this case, the OED problem seeks 
to find an optimal placement of sensors. Specifically, we formulate the OED problem as that 
of selecting an optimal subset from a set of candidate sensor locations, which is a common 
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approach; see e.g., [3, 20,37]. To make matters concrete, we begin by fixing a set of points 
{x1,%2,...,%n,} that indicate the candidate sensor locations. We then assign a binary weight 
w; to each candidate location x;; a weight of one indicates that a sensor will be placed at 
the corresponding candidate location. The binary vector w € {0,1}"s thus fully specifies an 
experimental design in the present setting. 


4.1. Design of the Bayesian inverse problem. The design w enters the formulation of 
the Bayesian inverse problem through the data likelihood. This requires additional care in the 
present work because the total error covariance matrix T, is non-diagonal. We follow the setup 
in [30] to incorporate w in the Bayesian inverse problem formulation. For a binary design vector 
w € {0,1}"s, we define the matrix P as submatrix of diag(w) with rows corresponding to the 
zero weights removed. Thus, given a generic measurement vector d € R”s, P,,d returns the 
measurements corresponding to active sensors. For d € R?s, we use the notation dy = Pud. 

Next, we describe how a design vector w enters the Bayesian inverse problem, within the 
BAE framework. For a given w, we consider the model y,, = Py(F(m)+v). Using this model 
leads to the following, w-dependent, data likelihood 


1 T 
(4.1) Hf (wlm) x exp { — 5(y — F(m) — eo)" B(w)(y — F(m) — eo) |, 
with 
(4.2) B(w) =P Pe, where Dyw BP. 


gr a we obtain the following w-dependent Gaussian approximation to the pos- 
terior, Ppost = =N (mite, Ce) where m% p is obtained by minimizing 


(4.3)  Tw(m;y) = sly - F(m) — €9)"3(w)(y — F(m) — so) + “mn Mp Moda, 
and 
(4.4) Cho = (Fm) E(w) Fim (mie) + Co) 


Note that in practical computations, the cost function Jw in (4.3) will be implemented as 


(Yw —PyF(m)— Pweo) Tod, (te —PyF(m)— Puto) +5 (m — mpr, M — More, 


Nile 


Tu (m;y) = 


with D,w as in (4.2). This amounts to using the data from the active sensors and removing 
the rows and columns corresponding to inactive sensors from the error covariance matrix T,. 


4.2. The design criterion. In the present work, we follow an A-optimal design strategy, 
where the goal is to find designs that minimize the average posterior variance. Generally, 
computing the average posterior variance for a Bayesian nonlinear inverse problem is compu- 
tationally challenging. We follow the developments in [4] to define a Bayesian A-optimality 
criterion in the case of nonlinear inverse problems. 

Given a data vector y € R”s, an approximate measure of posterior uncertainty is provided 


by (San However, when solving the OED problem data is not available. Indeed, it is the 
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goal of the OED problem to specify how data should be collected. To overcome this, we follow 
the general approach in Bayesian OED of nonlinear inverse problems, where we consider 
By {tC with E, denoting expectation with respect to the set of all likely data. We 
compute this expectation by using the information available in the Bayesian inverse problem 
and the information regarding the distribution of the model uncertainty. Namely, following 
the approach in [4], we use the design criterion 


(as) ow) = ff f CR) Toseta, E) — y)dy tipe(dem) (te), 


where [noise is the probability density function (pdf) of the noise distribution M (0, Poise). In 
practice, the design criterion (4.5) will be computed via sample averaging. Specifically, we use 


(4.6) De a) 


where the training data samples y$, are given by y$, = Pw(G(m', €’)+n'), with { (mt, €, ni) yic 
a sample set from the product space (M, upr) ® (X, we) O (R™,N (0, Phoise)). In large-scale 
applications, typically a small ng can be afforded. However, in this context, typically a mod- 
est na enables computing good quality optimal designs. This is also demonstrated in the 
computational results in the present work. 

Note that, for each i € {1,...,ng}, 


oO 


Yu DU 
tr(CUa) = Coe Ej, ej) 4 
j=1 


where {e;}72, is a complete orthonormal set in M. Inserting this in (4.6) and using the 


definition of em. we have 
(4.7) 1356 Cis €j) 
i=1 j=1 
where, for i € {1,..., na} and j EN, 
(4.7b) me, = arg min Jy(m;y'), 
(4.7c) (Fin (mithp)*(w)Fim (map) + e =e). 


Computing the OED objective as defined above is not practical. However, (4.7) provides 
insight into the key challenges in computing the OED objective. In the first place, (4.7b) is a 


challenging PDE-constrained optimization problem whose solution is the MAP point mu. 
Moreover, upon discretization, (4.7c) will be a high-dimensional linear system with the dis- 
cretization of the operator (Fm (miar) D(w)Fim (miar) +C) as its coefficient matrix. Such 
systems can be tackled with Krylov iterative methods that require the application of the coef- 
ficient matrix on vectors. In Section 5, we present two approaches for efficiently computing the 
OED objective: one approach uses randomized trace estimation and the other utilizes low-rank 
approximation of Fm(m¥g p)“ E (w)Fm(m%8 p). These approaches rely on scalable optimization 
methods for (4.7b) as well as adjoint based gradient and Hessian apply computation. 
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4.3. The optimization problem for finding an A-optimal design. We state the OED 
problem of selecting the best K sensors, with K < ns, as follows: 


in © 
a na (w) 


(4.8) ns 


This is a challenging binary optimization problem. One possibility is to pursue a relaxation 
strategy [3, 4,30] to enable gradient-based optimization with design weights w; € [0,1]. A 
practical alternative is to follow a greedy approach to find an approximate solution to (4.8). 
Namely, we place sensors one at a time. In each step of a greedy algorithm, we select the 
sensor that provides the largest decrease in the value of the OED objective d,,. While the 
solutions obtained using a greedy algorithm are sub-optimal in general, greedy approaches 
have shown good performance in many sensor placement problems; see, e.g., [5,23, 28,33]. In 
the present work, we follow a greedy approach for finding approximate solutions for (4.8). The 
effectiveness of this approach is demonstrated in our computational results in Section 7. 


5. Computational methods. We begin this section with a brief discussion on computing 
the BAE error statistics in Section 5.1. We then detail our proposed methods for computing 
the OED objective in Section 5.2. The greedy approach for computing optimal designs is 
outlined in Section 5.3. Then, we discuss the computational cost of OED objective evaluation, 
using our proposed methods, as well as the greedy procedure in Section 5.4. 


5.1. Estimating approximation error statistics. As discussed in Section 3, the mean and 
covariance of the approximation error in (3.3) will be approximated via Monte Carlo sampling. 
Specifically, we begin by drawing samples {(m’, €")}"5, in (M x X, pr E ug) and compute 


1= 


Nme Nme 


a 1 do ae te co , S , 
Soe’, a eG Eo)(e’—€o)', where et = G(m',€)-F(m'). 


mc ~ . 
1=1 


Subsequently, we use €o = £ and T, = T.. The computational cost of this process is 2Nnme 
model evaluations. Note, however, that these model evaluations are done a priori, in parallel, 
and can be used for both the OED problem and solving the inverse problem. Typically only 
a modest sample size is sufficient for approximating the mean and covariance of the model 
error [10,32]. Specifically, as alluded to in Section 2, only dominant modes of the error covari- 
ance matrix need to be resolved. Note also that (a subset of) the model evaluations in (5.1) 
may be reused in generation of training data needed for computation of the OED objective. 


5.2. Computation of the OED objective. In this section, we present scalable computa- 
tional methods for computing the OED objective. Specifically, we present two methods: (i) 
a method based on the use of randomized trace estimators (Section 5.2.1) and (ii) a method 
based on low-rank spectral decompositions (Section 5.2.2). As discussed further below, the lat- 
ter is particularly suited to our overall approach in the present work. Therefore, we primarily 
focus on the method based on low-rank spectral decompositions, which we also fully elaborate 
in the context of a model inverse problem (see Section 6 and Section 7). However, the first 
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method does have its own merits and is included to provide an alternative approach. The 
relative benefits and computational complexity of these methods are discussed in Section 5.4. 

Before we proceed further, we define some notations that simplify the discussions that 
follow. For i € {1,...,na}, we define 


(5.2) Hi (w) = Fin (mex,)*D(w)Fim(mexp) and H'(w) = CPH (w). 


Note that the operator H’ is the so-called Gauss-Newton Hessian of the data mistfit term in 
the definition of Jw(m; yt) in (4.3), evaluated at m¥8 p. 


5.2.1. Monte Carlo trace estimator approach. We begin by deriving a randomized 
(Monte Carlo) trace estimator for the posterior covariance operator (4.4). This is faciliated 
by the following technical result. 


symtt 


Proposition 5.1. Let C € L/” (M) and K € £(M), and consider the Gaussian measure 
u=N(0,C) on M. Then, fy (Kz, 2) 4 (dz) = tr(C!/2Kc1/2), 


Proof. By the assumptions on C and K, we have that both CK and C!/2KC!/? are trace- 
class. Also, by the formula for the expectation of a quadratic form in the infinite-dimensional 
Hilbert space setting [2, Lemma 1], we have that fy (Kz, z}ų (dz) = tr(CK). To finish the 
proof, we let {e;}92, be the (orthonormal) basis of the eigenvectors of C with corresponding 
(real, positive) eigenvalues (A;J +, and note tr(CK) = $`; (CKe;, ej) 4 = Xy Ai Kej, eilu = 


D (KC! ej, Ca = X; (CRC Pes, eu = tr(C!/2KC1/2). = 
Next, note that we can write E as 


a ~. 


(5.3) CB, = (Fam) E(w) Fin (me p) +C) = CH (Hi(w) + 1). 


post = 


Using this and letting C = Cpr and K = (Hi(w) + j~ in Proposition 5.1, we have tr(C22,) = 
Ta (Hw) + nt, 2) q (dz). Approximating the integral on the right hand side via sam- 


pling, we obtain 


i 1 e = 
r(e = — 5 (Hi) + 1) "23, 27) 9s 
n 


where {z;}/", are draws from the measure u = N(0,Cpr). This randomized (Monte Carlo) 
trace estimator allows approximating the OED objective (4.6), as follows. 


1 Na Ntr 
t 
(5.4a) n (w) = E » 2 (cij, Zilu 
1=1 9=1 
where, for i € {1,..., na} and je {1,..., ner}, 
(5.4b) mes, = arg min Jw(m;y'), 
m 


(5.4c) (H'(w) + 1) Cig = Žj. 
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The major computational challenge in computing DJ (w) is solving the MAP estimation 
problems in (5.4b). These inner optimization problems can be solved efficiently using an 
inexact Newton-Conjugate Gradient (Newton-CG) method. The required first and second 
order derivatives can be obtained efficiently using adjoint-based gradient and Hessian apply 
computation. The Hessian solves in (5.4c) are also done using (preconditioned) CG. This only 
requires the action of H’(w) on vectors, which can be done using the adjoint method. A 
detailed study of the computational cost of computing dir (w) is provided in Section 5.4. 


5.2.2. Low-rank approximation approach. Due to the use of finite-dimensional observa- 
tions, H’ in (5.2) has a finite-dimensional range. Also, often H’ exhibits rapid spectral decay 
and, in cases where the dimension of measurements is high, the numerical rank of this operator 
is typically much smaller than its exact rank. Note that the exact rank is bounded by the 
measurement dimension. Such low-rank structures are due to possible smoothing properties 
of the forward operator and that of Cpr; see, e.g., [12]. The following technical result facilitates 
exploiting such low-rank structures to compute the OED objective. 


Proposition 5.2. Let C and A be in LẸ” (M). Letting (vi), be the orthonormal basis 
of eigenvectors of A with corresponding (real non-negative) eigenvalues {Ax }72,, we have 


oO 


-1)_ 1/2) 
tr(CU + A) ) = tr(C) — Der ER a or vel 
Proof. The result follows by noting that 


tr(C) — tr(C(I HAD =5 Convida — X (CU + Ao Un) 
k= 


= 1 
(Cor vida — >, (Cor, Uk) 


k=1 ma + AR 
E oo Ak oo 12y, 


Next, we consider H'(w) and let {(Àik, vik) Hi be its dominant eigenpairs. (The de- 
pendence of eigenvalues and eigenvectors on w is suppressed, for notational convenience.) 
Using Proposition 5.2 with C = Cpr and A = H’ (w), we obtain the approximation 

Wo g= = (C(I ER C) 
(5.5) pon á 


= tr(Cpr(Z + H’(w)) 7!) ~ tr(Cpr) — 2 Cof» 


2 
sk llu 


E 


Observe that only the second term in the right hand side depends on w. This term, which 
provides a measure of uncertainty reduction, can be used to define the OED objective. Specif- 
ically, using (4.6) along with (5.5), leads to following form of the OED objective: 


(5.6a) P(e =e pda Ieee 
di=1 k=1 
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where, for i € (1,...,na), 


(5.6b) mB, = arg min Jw(m;y'), 
(5.6c) Hi (w) vie = MpVik k €{1,...,r}, 
(5.6d) (Vik, Vil) 4 = Opi k,l E€ Tlr}. 


Note that the numerical rank of H' will, in general, depend on i. In (5.6), we have used a 
common target rank r for simplicity. In the present setting, this target rank is bounded by 
the number of active sensors for a given w. 

As in the case of the approach outlined in Section 5.2.1, the dominant computational 
challenge in computing ®7,4(w) is the solution of the MAP estimation problems (5.6b). This 
will be tackled using the same techniques. The eigenvalue problem (5.6c) can be tackled 
via Lanczos or randomized approaches. The target rank r, can be selected as the number 
of active sensors. This is suitable in cases where we have a small number of active sensors. 
See Section 5.4 for further details regarding the computational cost of evaluating 04º. 

We point out that if the dominant eigenvalues are simple, the condition (vik, viz) gq = 1 
for all k € {1,...,r}, implies the orthonormality condition (5.6d). This follows from the fact 
that the eigenvectors corresponding to distinct eigenvalues of H’, which belongs to £ aii (M), 
are orthogonal. The simplicity assumption on the dominant eigenvalues is observed in many 
applications where the dominant eigenvalues decay rapidly. Making this simplicity assumption, 
and letting Sik = Coe vite we may also write the eigenvalue problem above as the following 
generalized eigenvalue problem 


H'(w) siz = AC pr Sik» 


(5.7) 
(Sik, Sik)e = 1, 


where à € {1,...,mq} and k € {1,...,r}. This formulation is helpful when describing the 
eigenvalue problem in the weak form, as seen in Section 6.2. In particular, the eigenvalue 
problem (5.7) will be formulated in terms of the incremental state and adjoint equations and 
the adjoint based expression for H’ applies. 


5.3. Greedy optimization. We follow a greedy approach for solving the OED problem. 
That is, we select sensors one at a time: at each step, we pick the sensor that results in the 
greatest decrease in the OED objective value; see Algorithm 5.1. As mentioned before, we use 
the approach in Section 5.2.2 for computing the OED objective. That is, we use the OED 
objective 97, as defined in (5.6). 

Theoretical justifications behind the use of a greedy approach for sensor placement, in 
various contexts, have been investigated in a number of works [23,28,33]. The solution obtained 
using the greedy algorithm is known to be sub-optimal. However, as observed in practice, the 
use of a greedy algorithm is a practical approach and is effective in obtaining near optimal 
sensor placements; see also [5, 29,38]. We demonstrate the effectivness of this approach, in our 
computational results in Section 7. 


5.4. Computational cost of sensor placement. The fth step of the greedy algorithm 
requires ns — £— 1 OED objective evaluations (cf. step 5 of Algorithm 5.1); these can be 
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Algorithm 5.1 Greedy approach for solving the OED problem (4.8) with ®,, = oes i 


Input: The target number of sensors K in (4.8). 
Output: The optimal design vector w. 
1: w10 
2: Teandidate {1, e Ns} 
3: Tactive S 0 
4: for L= 1 to K do 
5: 
6 


Evaluate Ori (w + e;), for all j € Teandidate {e; is the jth coordinate vector in R”=} 
ie argmin ®°'8(w + ej) 
J€Lcandidate 
T: Lactive = Lactive U {ic} 
8: Teandidate $= Teandidate \ {ic} 
9: we Wt ej, 
10: end for 


performed in parallel. It is straightforward to note that placing K sensors using the greedy 
approach requires a total of C(K,ns) := Kns — K(K — 1)/2 OED objective function evalua- 
tions. Therefore, the overall cost of computing an optimal design, in terms of the number of 
PDE solves, using the proposed approach is C(K, ns) times the number of PDE solves required 
in each OED objective evaluation. Thus, to provide a complete picture, in this section, we de- 
tail the cost of OED objective evaluation using the two approaches discussed in Section 5.2. A 
key aspect of both of these approaches is that the cost of OED objective evaluation, in terms 
of the number of required PDE solves, is independent of the dimension of the discretized 
inversion and secondary parameters. 

In the following discussion, the rank of the operators H'(w) in (5.2) plays an important 
role. As mentioned before, the exact rank of these operator is bounded by the number of active 
sensors. Moreover, if the number of active sensors is high, the numerical rank is typically 
considerably smaller than the exact rank. We denote the number of active sensors in a given 
design by Nact = Nact(w). Note that for a binary design vectors w, Nact(w) = ||w||1. 

Cost of evaluating a The most expensive step in evaluating (5.4) is solving the inner 


optimization problems for the MAP points mb, i € {1,...,ng}. In the present work, the 
inner optimization problem is solved via inexact Gauss-Newton-CG with line search. The 
cost of each Gauss—Newton iteration is dominated by the CG solves for the search direction. 
When using the prior covariance operator as a preconditioner, the number of CG iterations is 
bounded by O(nact); see e.g., [22]. Each CG step in turn requires two linearized PDE solves 
(incremental forward/adjoint solves). Hence, the cost of the ng MAP estimation problems, 
in terms of “forward-like” PDE solves, is O(2 x ng X Nnewton X Nact), Where Newton is the 
(average) number of Gauss-Newton iterations. The 2 x mq X ntr solves in (5.4c) are also done 
using preconditioned CG. As noted before, each of these solves requires O(nact) CG iterations. 
To summarize, the overall cost of evaluating (5.4) is bounded by 


(5.8) OM X na X Nnewton X Tact) + O(2 X tha X Te X Mact) é 
s= 


cost of MAP point solves in (5.4b) cost of solves in (5.4c) 
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It is important to note that MAP estimation problems as well as the linear solves in (5.4c) 
can be performed in parallel. 

Cost of evaluating Pn;. As is the case with (5.4), the solution of the MAP estimation 
problems (5.6b) dominates the computational cost of evaluating 94%. This cost was analyzed 
for the case of BIT . We next discuss the cost of solving the eigenvalue problems in (5.6c). We 
rely on the Lanczos method [19] for these eigenvalue problems.” For each i € {1,...,na}, the 
Lanczos method requires O(nact) applications of H! on vectors, each costing two PDE solves. 
Therefore, solving the eigenvalue problems requires a total of O(2 x na X Nact) PDE solves. 
Hence, the overall cost of computing 94,5 is bounded by 


(5.9) O(2 X na X Nnewton X Nact) + OR X Na X Nact) - 


s —ÃÁ— mam 
cost of MAP point solves in (5.6b) cost of solves in (5.6c) 


To sum up, the computational cost of evaluating both Di and os is dominated by the 
cost of solving the MAP estimation problems. However, by exploiting the low-rank structure 
of the operators H’, Pay is more efficient to compute, as it does not require ntr Hessian solves 
(compare also the second terms in (5.8) and (5.9)). Moreover, computing the traces using 
the eigenvalues will be, in general, more accurate than a sampling based approach, as long 
as sufficiently many eigenvalues are used. Note also that when using a greedy approach, nact 
starts at Nact = 1 in the first step and increase by one in each iteration. 

However, the idea of using a randomized trace estimator does have some merits. For one 
thing, typically a small (in order of tens) nt; is sufficient when solving the OED problem. 
Moreover, di is simpler to implement as it does not require solving eigenvalue problems. 
Note, however, that if a large nir is needed, then the cost of Hessian solves in (5.4c) might 
exceed the cost of the MAP estimation problems. 


6. Model problem. Here, we present the model problem used to study our approach for 
OED under uncertainty. We consider a nonlinear inverse problem governed by a linear elliptic 
PDE, in a three-dimensional domain. This problem, which is adapted from [32], is motivated 
by heat transfer applications. In Section 6.1, we detail the description of the Bayesian inverse 
problem under study. Subsequently, we detail the description of the OED objective r$, for 
this specific model problem in Section 6.2. 


Figure 1. Sketch of the physical domain Q and location of candidate sensor locations (blue circles). 


3 Another option is the use of randomized methods [21]. 
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6.1. The Bayesian inverse problem. We consider the following model 


-V-(&Vu)=0 inQ, 


u=0 onTp, 
(6.1) é 
eVun=h only, 


eVun+ePu=0 onTr. 


In this problem, Q C R° is bounded domain with sufficiently smooth boundary r = P5UINUTR, 
where Tp, IN, and TR are mutually disjoint. In (6.1), u is the state variable, m is the inversion 
parameter, and € is the secondary uncertain parameter. The Neumann boundary data is 
h € L?(Ty), n is the unit-length outward normal for the boundaries Ty and TR, and for 
simplicity we have considered zero volume source, and homogeneous Dirichlet and Robin 
conditions. In our numerical experiments, Q is a three-dimensional domain with a unit square 
base and height such that aspect ratio (base/height) is 100; see Figure 1. In the present 
example, Ir is the bottom edge of the domain, IN is the top edge, and Ip is the union 
of the side edges. In our computations, we define the boundary source term h according to 
h(x) = 1 + sin (4r y (z1 — 1)? + (z2 — 1)?). 

Note that in this problem, the inversion and secondary parameters are both functions. The 
inversion parameter belongs to the Hilbert space M = L?(TR), which is equipped with the 
L?(Tg)-inner product. The secondary parameter £ takes values in L?(Q). As discussed below, 
we define € as an L?(Q)-valued Gaussian random variable with almost surely continuous 
realizations, which ensures (almost sure) well-posedness of the problem (6.1). Below we use 
(Jo and (-,-)p, to denote the L?(Q)- and L?(Iy)-inner products, respectively. Also, with a 


slight abuse of notation, we denote the L?(Q)? inner-product with (-,-)g as well. 

We use measurements of u on the top boundary, Iy, to estimate the inversion parameter 
m. The measurements are collected at a set of sensor locations as depicted in Figure 1. Hence, 
in the present problem, the parameter-to-observable is defined as the composition of a linear 
observation operator B, which extracts solution values at the sensor locations, and the PDE 
solution operator. Note that this parameter-to-observable map is a nonlinear function of the 
inversion and secondary parameters. 

As detailed in Section 3, we assume a Gaussian measurement noise model, and use the 
BAE approach to account for the secondary uncertainties in the inverse problem. Also, we 
use a Gaussian prior law for m and model the secondary uncertainty as a Gaussian random 
variable.’ For both m and £, we use covariance operators that are defined as negative powers 
of Laplacian-like operators [36]; see Section 7. Note that with the appropriate choice of the 
covariance operator, the realizations of € will be almost surely continuous on the closure of 2. 


6.2. OED under uncertainty problem formulation. In this section we discuss the precise 
definition of $r, for the model inverse problem discussed in Section 6.1. The MAP estimation 
problems in (5.6b) require minimizing Jw(m; y’) defined in (4.3). For notational convenience, 


in this section, we use the shorthand m* to denote mus. For the present example, the first 


“As discussed earlier, the presented framework does not rely on a specific choice of distribution law for £. 
The Gaussian assumption here is merely for computational convenience. 
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order optimality conditions for this optimization problem are given by 


(EV ui, Vôo — (h, Dr + (iu; B) 4 = 0, Vp E H, 
(6.3) (EVpi, Vida + (ù, pi) 4 + (B*E(w)(Bui — y’ + £0), Do = 0, Vie %, 
(m5 = Mpr, M)g = (me? ui, Di) 4 = 0, Vm E 6, 


where % = {v € HH(9): vn, = 0}. For the derivation details, we refer to [32]. Note that (6.2) 
is the weak form of the state equation (6.1), and (6.3) and (6.4) are the weak forms of the 
adjoint and gradient equations, respectively. Note also that the left hand side of (6.4) describes 
the action of the derivative of Jw(m; yt) in the direction M. 

To specify the eigenvalue problem in (5.6c), we first discuss the action of the operator H’ 
(evaluated at m*) in the direction 7. In weak form, this Hessian application satisfies, 
(6.5) (Hi (mý), my = (Meus, Pi) qo 
for every Mm € &. In (6.5), for a given mj, u; solves the state problem (6.1), p; solves the 
incremental adjoint problem 
(6.6) (EV Pi Vo + (ei, Piu + (B*U(w)Bix,ii)y =0, Vie %, 
and û; solves the so-called incremental state problem 


(6.7) (EV à, VB) + (Chi, Pa + (e™ thu, dr, =0, VBE. 


We next summarize the OED problem of minimizing (5.6) as a PDE-constrained opti- 
mization problem specialized for the model inverse problem in Section 6: 


1 na r A; 
6.8a min —— ik jg. |2 
( ) we{01}"d Na >, a 1+ Aik I iku 


where, for i € {1,...,nq} and k € {1,...,r}, 


(6.8b) (EV us, Vôo — Dr, + (e Ui, B) y =0, vp Eh, 
(6.8c) (EV pi, Vito + (ii, piju + (B*S(w) (Bu; — yi + €0),U)g = 0, Vue %, 
(6.8d) (mi — mpg + (Me™ ui, pi) y = 0, Vin ee, 
(6.8e) (EV Pik, Vig + (eT ù, Bir) y + (B*E(w)Biix, Do = 0, Va € %, 
(6.8f) (meiu; Piu = Aiklsik Me, Vin € &, 
(6.8g) (EV ûik, VB) q + (e tik, Du +(e" Sikti, Da = 9, Vp Eh, 
(6.8h) (Sik, Silos 1. 


The PDE constraints (6.8b)-(6.8d) are the optimality system (6.2)-(6.4) characterizing the 


MAP point mi = mb, described in (5.6b). The equations (6.8e)-(6.8g) are the PDE con- 
straints that describe the Hessian apply and eigenvalue problem. Note that we have reformu- 
lated the eigenvalue problem according to (5.7). 
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7. Computational results. In this section, we numerically study our OED under uncer- 
tainty approach, which we apply to the model inverse problem described in Section 6. We begin 
by specifying the Bayesian problem setup and discretizaton details in Section 7.1. Then, we 
study the impact of secondary model uncertainty on the measurements and compute the BAE 
error statistics in Section 7.2. Finally, in Section 7.3, we examine the effectiveness of our ap- 
proach in computing uncertainty aware designs. We also study the impact of ignoring model 
uncertainty in (i) experimental design stage and (ii) both experimental design and inference 
stages. Ignoring uncertainty in the design stage amounts to fixing the secondary parameter 
€ at its nominal value (i.e., using the approximate model (3.1)) and ignoring the approxima- 
tion error when solving the OED problem. We refer to designs computed in this manner as 
uncertainty unaware designs. Note that in the case of (i), the uncertainty is still accounted 
for when solving the inverse problem. This study illustrates the importance of accounting for 
model uncertainty, when computing experimental designs. On the other hand, the study of 
case (ii) illustrates the pitfalls of ignoring uncertainty in both the optimal design problem and 
subsequent solution of the inverse problem. 


7.1. Problem setup. We consider ns = 100 candidate sensor locations that are arranged 
in a regular grid on the top boundary Iy; see Figure 1. The additive noise in the synthetic 
measurements has a covariance matrix of the form Pyoise = 071, with o = 10-3. This amounts 
to about one percent noise. 

We use a Gaussian prior law pr = N (Mpr, Cpr) for m. The prior mean is taken as a constant 
function mpr = 1 and we use a covariance operator given by the inverse of a Laplacian- 
like operator. Specifically, we let Cpr := A~?, with Alm] = —V - (0Vm) + am, where we 
take 0 = 0.1 and a = 1. To help mitigate undesirable boundary affects that can arise due 
to the use of PDE-based prior covariance operators, we equip the operator A with Robin 
boundary conditions [16]. For illustration, four random draws from the prior distribution are 
shown in Figure 2. The law of the secondary parameter € is chosen to be a Gaussian measure 


3 
2 
1 


Figure 2. Samples of the primary uncertain parameter m. 


= 


o 


ue = N (é ; Ce). We set the mean of € as the constant function É = 0, and the covariance 
operator is defined as Ce := L~”, where L[€] = —V - (OVE) +9E with © = 0.25 diag(1, 1, o) 
and y = 50. This choice of O corresponds to a random field with much shorter correlation in 
the z—direction than in the z- and y-directions, inline with aspect ratio of 100 used in defining 
the domain Q. We show four representative samples of é in Figure 3 . 

The forward problem (6.1) is solved using a continuous Galerkin finite element method. 
We use 9,600 tetrahedral elements and 2,205 piecewise linear basis functions. As such, the 
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Figure 3. Samples of the secondary uncertain parameter £. 
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Figure 4. The effect of the secondary uncertain parameter € on the state u on top surface for the (fixed) 
true primary parameter m and four different realizations of E. 
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discretized state and adjoint variables, as well as the secondary parameter É, have dimension 
2,205. On the other hand, the discretized inversion parameter m, which is defined on the 
bottom boundary, is of dimension 441. 


7.2. Incorporating the model uncertainty in the inverse problem. We begin by studying 
the impact of the secondary parameter on the solution of the forward problem. This is illus- 
trated in Figure 4. In this experiment, we solve the forward problem for a fixed m and four 
different realizations of é. Note that, although the qualitative behavior of u on Ty is similar for 
the different samples of €, there is considerable differences in the values. This indicates that 
the approximation error due to fixing € at the nominal value will have significant variations. 

We compute the sample mean and covariance matrix of the approximation error as in (5.1) 
with nme = 1,000. The mean and marginal standard deviations are shown in Figure 5. To 
illustrate the correlation structure of the approximation error, we show the correlation of the 
approximation error at two of the candidate locations with the other candidate locations in 
Figure 6 (left-middle). To provide an overall picture, we report the entire correlation matrix of 
the approximation error in Figure 6 (right). We see that the approximation errors at the sensor 
sites are not only larger than the measurement noise, but also they are highly correlated and 
have a nonzero (and non-constant) mean. Thus, in the present application, the approximation 
error due to model uncertainty cannot be ignored and needs to be accounted for. 


7.3. Optimal experimental design under uncertainty. We begin by solving the OED 
problem (6.8) with na = 5 training data samples. In Figure 7 (left), we show an uncertainty 
aware optimal sensor placement with 20 sensors. Note that due to the use of a greedy al- 
gorithm, we can track the order in which the sensors are picked. To understand the impact 
of ignoring model uncertainty in the design stage, we also compute an uncertainty unaware 
design; this is reported in Figure 7 (right). 

To evaluate the quality of the computed designs, we first study the expected posterior 
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Figure 5. Left: the mean of the approximation error at the candidate sensor locations; right: the marginal 
standard deviation of the approximation error at these locations (i.e., the square root of diagonal entries of Pe 
given in (5.1)). 
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Figure 6. Left: The correlation of a measurement from a sensor near the middle marked by red dot, with 
the remaining sensors; middle: The same information for the sensor in the top left corner of the sensor grid; 
right: the correlation matrix of the approximation error. 
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Figure 7. The uncertainty aware (left) and uncertainty unaware (right) sensor placements. The numbers 
indicate the order in which the sensors are picked in the greedy algorithms for minimizing (5.6a). 


variance and expected relative error of the MAP point. To facilitate this, we draw parameter 
samples {m} };%; from pr and generate validation data samples, 


diy = Po(G(mi,&) + ni), i {1,..., nv}, 
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where (6H, and (n;), are draws from pg and N (0, Phoise), respectively. Then, we consider 
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z 1 Ny ; B 1 Ny 
V(w) = Gs) and Exap(w) = — >: 
V j=1 V j=1 


In our numerical experiments we use ny = 20. We compare V(w) and Eyar(w) when solving 
the inverse problem with the computed optimal design versus randomly chosen designs in Fig- 
ure 8, where we consider designs with 10, 15, and 20 sensors. We also examine the impact of 
ignoring the model uncertainty in solving the OED problem (see the black dots in Figure 8). 
Note that the uncertainty aware designs outperform the random designs as well as the uncer- 
tainty unaware designs. We also observe that as the number of sensors in the designs increase, 
the cloud moves left and downward. This is expected. The more sensors we use, the more we 
can improve the quality of the MAP point and reduce posterior uncertainty. 
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Figure 8. The expected relative error Emap(w) of the MAP point versus expected posterior variance V (w) 
using random designs (red dots), an uncertainty aware sensor placements (blue dots), and uncertainty unaware 
sensor placements (black dots) using 10, 15, and 20 sensors. 


Next, we illustrate the effectiveness of the computed optimal designs in reducing pos- 
terior uncertainty. To this end, we consider the computed optimal designs with 10 sensors. 
In Figure 9, we show the effectiveness of the uncertainty aware optimal design in reducing 
posterior uncertainty; we also report the posterior standard deviation field, when solving the 
inverse problem with an uncertainty unaware design. To complement this study, we consider 
the quality of the MAP points computed using uncertainty aware and uncertainty unaware 
designs in Figure 10. Overall, we observe that the uncertainty aware design is more effective 
in reducing posterior uncertainty and results in a higher quality MAP point. This conclusion 
is also supported by the results reported in Figure 8. 

Note that the data used to solve the inverse problem is synthesized by solving the PDE 
model (1.1), using our choice of the “truth” inversion parameter (see Figure 10 (left)) and 
a randomly chosen € followed by extracting measurements at the sensor sites and adding 
measurement noise. This simulates the practical situation when field data that corresponds 
to an unknown choice of € is collected. 

In the above experiments, when examining the performance of uncertainty unaware de- 
signs, model uncertainty was still accounted for (following the BAE framework) when solving 
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Figure 9. The effectiveness of the computed optimal design with 10 sensors on posterior uncertainty. We 
report the pointwise prior standard deviation of m (left), and the pointwise posterior standard deviation of m 
using the uncertainty aware (middle) and uncertainty unaware (right) optimal designs. 
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Figure 10. The quality of the MAP points computed using optimal designs with 10 sensors. We show the 
true parameter field (left), the MAP point computed using the uncertainty aware optimal design (middle) and 
using the uncertainty unaware design (right). 


the inverse problem using these designs. Next, we examine the impact of ignoring model un- 
certainty in both OED and inference stages. For this experiment, we use the same synthesized 
data as that used to obtain the results in Figure 10 (right). In Figure 11 we report the re- 
sult of solving the inverse problem with an uncertainty unaware design, when ignoring model 
uncertainty in the inverse problem as well. We note an impressive reduction of uncertainty; 
see Figure 11 (left), which uses the same scale as the standard deviation plots in Figure 9. 
On the other hand, the MAP point computed in this case is of very poor quality; see Fig- 
ure 11 (right), where we have used the same scale as the MAP point plots in Figure 10. 
Intuitively, this indicates that ignoring model uncertainty in both OED and inference stages, 
in presence of significant modeling uncertainties, can lead to an unfortunate situation where 
one is highly certain (i.e., low posterior variance) about a very wrong parameter estimate. 


8. Conclusions. In the present work, we addressed OED under uncertainty for Bayesian 
nonlinear inverse problems governed by PDEs with infinite-dimensional inversion and sec- 
ondary parameters. We have presented a mathematical framework and scalable computational 
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Figure 11. The posterior standard deviation field (left) and MAP point (right) when solving the uncertainty 
unaware inverse problem using an uncertainty unaware design. 


methods for computing uncertainty aware optimal designs. Our results demonstrate that ig- 
noring the uncertainty in the OED and/or the parameter inversion stages can lead to inferior 
designs and inaccurate parameter estimation. Hence, it is important to account for modeling 
uncertainties in Bayesian inversion and OED. 

The limitations of the proposed approach are in its reliance on Gaussian approximations 
for the posterior and the approximation error. The former is a common approach in large-scale 
Bayesian inverse problems as well as in the BAE literature. The Gaussian approximation to 
the posterior is suitable if a linearization of the forward model, at the MAP point, is sufficiently 
accurate for the set of parameters with significant posterior probability. On the other hand, the 
Gaussian approximation of the approximation error, which is guided by the BAE approach, 
might fail to adequately capture the distribution of the approximation error. However, as 
shown in various studies in the BAE literature, this Gaussian approximation is reasonable in 
broad classes of inverse problems; see, e.g., [24, 26,31,32]. Additionally, in the present work 
we considered a greedy approach for tackling the binary OED optimization problems. This 
can become expensive if the number of candidate sensor locations or the desired number of 
sensors in the computed sensor placements become very large. 

The discussions in this article lead to a number of opportunities for future work. In the first 
place, we note that the use of BAE approach to account for modeling uncertainties in Bayesian 
inverse problems is only one possible use case of this approach. In general, BAE has been used 
in a wide range of applications to account for uncertainty due to the use of approximate 
forward models. For example, BAE can be used to model the approximation errors due to the 
use of reduced order models or upscaled models. BAE has also been used to account for the 
errors due to the use of a mean-field model instead of an underlying high-fidelity stochastic 
model; see, e.g., [34]. Hence, the framework presented for OED under uncertainty in the present 
work can be extended to OED in inverse problems where the forward model is replaced with 
a low-fidelity approximate model instead of a computationally intensive high-fidelity model, 
as long the approximation error can be modeled adequately by a Gaussian. 

Another interesting line of inquiry involves replacing the greedy strategy used to find 
optimal designs with more powerful optimization algorithms. One possibility is to follow a 
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relaxation approach [3, 4,8,30] where the design weights are allowed to take values in the 
interval [0, 1]. This enables use of efficient gradient-based optimization algorithms and can be 
combined with a suitable penalty approach to control the sparsity of the computed sensor 
placements. An attractive alternative is the approach in [9], which tackles the binary OED 
optimization problem by replacing it with a related stochastic programming problem. A further 
line of inquiry is investigating the idea of using fixed MAP points, computed prior to solving 
the OED problem, as done in [38] for OED problems with no additional uncertainties. This 
idea, at the expense of further approximations, replaces the bilevel optimization problem with 
a simpler one, hence reducing computational cost significantly. 
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